
Long-term Recurrent Convolutional Networks for Visual Recognition and Description


Abstract

Models based on deep convolutional networks have dominated recent image interpretation tasks; we investigate whether models which are also recurrent, or "temporally deep", are effective for tasks involving sequences, visual and otherwise. We develop a novel recurrent convolutional architecture suitable for large-scale visual learning which is end-to-end trainable, and demonstrate the value of these models on benchmark video recognition tasks, image description and retrieval problems, and video narration challenges. In contrast to current models which assume a fixed spatio-temporal receptive field or simple temporal averaging for sequential processing, recurrent convolutional models are "doubly deep" in that they can be compositional in spatial and temporal "layers". Such models may have advantages when target concepts are complex and/or training data are limited. Learning long-term dependencies is possible when nonlinearities are incorporated into the network state updates. Long-term RNN models are appealing in that they can directly map variable-length inputs (e.g., video frames) to variable-length outputs (e.g., natural language text) and can model complex temporal dynamics; yet they can be optimized with backpropagation. Our recurrent long-term models are directly connected to modern visual convnet models and can be jointly trained to simultaneously learn temporal dynamics and convolutional perceptual representations. Our results show such models have distinct advantages over state-of-the-art models for recognition or generation which are separately defined and/or optimized.
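The core idea in the abstract — run a convnet on each frame, then feed the per-frame features through a recurrent network with nonlinear state updates, mapping a variable-length input sequence to a variable-length output sequence — can be sketched in a few lines. This is a minimal numpy illustration, not the paper's implementation: the "convnet" here is a stand-in random projection, and the weights `W`, `U`, `b` are toy LSTM parameters introduced only for the example.

```python
import numpy as np

def lstm_step(x, h, c, W, U, b):
    """One LSTM step: nonlinear state update from input x and previous (h, c)."""
    z = W @ x + U @ h + b                    # all four gate pre-activations at once
    H = h.size
    i = 1 / (1 + np.exp(-z[:H]))             # input gate
    f = 1 / (1 + np.exp(-z[H:2*H]))          # forget gate
    o = 1 / (1 + np.exp(-z[2*H:3*H]))        # output gate
    g = np.tanh(z[3*H:])                     # candidate cell state
    c = f * c + i * g                        # long-term memory update
    h = o * np.tanh(c)
    return h, c

def lrcn_forward(frames, convnet, W, U, b, hidden):
    """Per-frame visual features, then an LSTM unrolled over the sequence."""
    h = np.zeros(hidden)
    c = np.zeros(hidden)
    outputs = []
    for frame in frames:                     # variable-length input sequence
        x = convnet(frame)                   # per-frame convnet feature (stand-in)
        h, c = lstm_step(x, h, c, W, U, b)
        outputs.append(h.copy())             # one output state per time step
    return np.stack(outputs)

# Toy stand-ins: a random projection as the "convnet" and random LSTM weights.
rng = np.random.default_rng(0)
feat, hidden, T = 16, 8, 5
P = rng.normal(size=(feat, 64))
convnet = lambda frame: P @ frame            # hypothetical feature extractor
W = rng.normal(size=(4 * hidden, feat)) * 0.1
U = rng.normal(size=(4 * hidden, hidden)) * 0.1
b = np.zeros(4 * hidden)
frames = rng.normal(size=(T, 64))            # 5 "frames" of flattened pixels
out = lrcn_forward(frames, convnet, W, U, b, hidden)
print(out.shape)                             # one hidden state per input frame
```

Because the whole pipeline is differentiable, the convnet and the LSTM can in principle be trained jointly with backpropagation, which is the "doubly deep", end-to-end property the abstract emphasizes.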
